Clustering rows and/or columns of a two-way contingency table and a related distribution theory

Author

  • Chihiro Hirotsu
Abstract

The row-wise multiple comparison procedure proposed in Hirotsu [Hirotsu, C., 1977. Multiple comparisons and clustering rows in a contingency table. Quality 7, 27–33 (in Japanese); Hirotsu, C., 1983. Defining the pattern of association in two-way contingency tables. Biometrika 70, 579–589] has been verified to be useful for clustering rows and/or columns of a contingency table in several applications. Although the method improved on the preceding work, there was still a gap between the squared distance between the two clusters of rows and the largest root of a Wishart matrix as a reference statistic for evaluating the significance of the clustering. In this paper we extend the squared distance to a generalized squared distance among any number of rows or clusters of rows and resolve the loss of power in the process of the clustering procedure. If there is a natural ordering in columns, we define an order-sensitive squared distance, and then the reference distribution becomes that of the largest root of a non-orthogonal Wishart matrix, which is very difficult to handle. We therefore propose a very nice χ²-approximation which improves the usual normal approximation in Anderson [Anderson, T.W., 2003. An Introduction to Multivariate Statistical Analysis. 3rd ed. Wiley Intersciences, New York] and also the first χ²-approximation introduced in Hirotsu [Hirotsu, C., 1991. An approach to comparing treatments based on repeated measures. Biometrika 78, 583–594]. A two-way table reported by Guttman [Guttman, L., 1971. Measurement as structural theory. Psychometrika 36, 329–347] and analyzed by Greenacre [Greenacre, M.J., 1988. Clustering the rows and columns of a contingency table. Journal of Classification 5, 39–51] is reanalyzed and a very nice interpretation of the data has been obtained. © 2009 Elsevier B.V. All rights reserved.
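To make the idea of a squared distance between two rows concrete, the sketch below uses one common choice: the loss in the Pearson chi-squared statistic when two rows are pooled. This is an illustrative assumption and is not claimed to be the generalized or order-sensitive distance defined in the paper; the function names and the toy table are hypothetical.

```python
from fractions import Fraction


def pearson_chi2(table):
    """Pearson chi-squared statistic for independence in a two-way table.

    Assumes every row total and column total is positive.
    """
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    stat = Fraction(0)
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count under independence: (row total * column total) / n
            exp = Fraction(row_tot[i] * col_tot[j], n)
            stat += (obs - exp) ** 2 / exp
    return stat


def merge_rows(table, i, k):
    """Return a new table in which rows i and k are pooled into one row."""
    pooled = [a + b for a, b in zip(table[i], table[k])]
    rest = [row for idx, row in enumerate(table) if idx not in (i, k)]
    return rest + [pooled]


def squared_distance(table, i, k):
    """Illustrative distance: chi-squared lost by pooling rows i and k."""
    return pearson_chi2(table) - pearson_chi2(merge_rows(table, i, k))


# Hypothetical 3 x 3 table of counts (not data from the paper).
toy = [[10, 20, 30], [12, 18, 30], [30, 20, 10]]
print(float(squared_distance(toy, 0, 1)))  # rows with similar profiles
print(float(squared_distance(toy, 0, 2)))  # rows with different profiles
```

Rows with proportional counts have distance zero under this definition, so pooling them loses no information about the association. Exact fractions are used only to keep that property free of rounding error.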

Similar resources

Latent Block Model for Contingency Table

Although many clustering procedures aim to construct an optimal partition of objects or, sometimes, of variables, there are other methods, called block clustering methods, which consider the two sets simultaneously and organize the data into homogeneous blocks. This kind of method has practical importance in a wide variety of applications such as text and market basket data analysis. Typica...

Decomposition of Contingency Table using Expected Values

A contingency table summarizes the conditional frequencies of two attributes and shows how these two attributes are dependent on each other with the information on a partition of universe generated by these attributes. Thus, this table can be viewed as a relation between two attributes with respect to information granularity. This paper focuses on decomposition of a contingency matrix by using ...

Contingency Matrix Theory – Investigation of Information Granules in Statistics –

This paper focuses on how statistical independence can be observed in a contingency table when the table is viewed as a matrix. Statistical independence in a contingency table is represented as a special form of linear dependence, where all the rows or columns are described by one row or column, respectively.

Non-parametric latent modeling and network clustering

The paper presents a non-parametric approach to latent and co-latent modeling of bivariate data, based upon alternating minimization of the Kullback-Leibler divergence (EM algorithm) for complete log-linear models. For categorical data, the iterative algorithm generates a soft clustering of both rows and columns of the contingency table. Well-known results are systematically revisited, and some ...

Co-SOFT-Clustering: An Information Theoretic Approach to Obtain Overlapping Clusters from Co-Occurrence Data

Co-clustering exploits co-occurrence information from contingency tables to cluster both rows and columns simultaneously. It has been established that co-clustering produces a better clustering structure than conventional clustering methods. So far, co-clustering has only been used as a technique for producing hard clusters, which might be inadequate for applications such as docum...

Journal:
  • Computational Statistics & Data Analysis

Volume 53  Issue 

Pages  -

Publication date 2009